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1.  Introduction 


The  purpose  of  this  study  is  to  examine  the  possibility  of  impiementing  an  iterative  algorithm  such  as 
the  conjugate  gradient  algorithm  in  an  optical  signal  processor.  This  research  is  an  extension  of  work 
done  as  part  of  the  Summer  Faculty  Research  Program  (SFRP)  in  1986  at  RADC,  Griffiss  AFB,  NY. 
The  period  of  performance  covered  by  this  report  is  March  1,  1987  to  March  31,  1988. 

The  SFRP  work  focused  on  a  prototype  acousto-optic  signal  processor  which  was  already  in 
experimental  operation  as  part  of  an  RADC  project  (see  [1,2]).  This  processor  uses  a  variation  of  the 
Least  Mean  Square  (LMS)  algorithm.  The  goal  of  the  current  project  is  to  investigate  more  powerful 
algorithms  such  as  conjugate  gradient  that  might  provide  improved  performance  for  such  a  processor. 

2.  Background  on  the  Problem 

2.1  The  Signal  Processing  Application 

The  particular  signal  processing  application  is  adaptive  noise  cancellation.  A  main  signal  is  received 
consisting  of  the  signal  of  interest  s(t)  plus  a  noise  signal  n(t).  Omni-directional  side  antennas  receive 
signals  rtj(t),  j=l,...,N.  A  weighted  combination  of  delayed  versions  of  these  side  signals  is  used  to 
estimate  the  noise  n(t).  We  denote  this  estimated  noise  by  y(t).  The  problem  is  to  determine  the 
optimum  combination  of  weights  in  order  to  minimize  the  difference  between  the  estimated  noise  and 
the  actual  noise. 

The  quantity  we  would  like  to  minimize  is 


E(l*(t)!2)  (2.1) 

where  e(t),  the  so  called  ’error  signal’,  is  the  difference  between  the  main  signal  plus  noise  s(t)  +  n(t) 
and  the  estimated  noise  y(t),  and  E  indicates  expected  value  over  all  time  with  respect  to  some 
probability  distribution.  In  practice,  rather  than  a  true  expected  value  over  all  time,  some  finite 
measure  or  summation  of  recent  signal  history  is  used. 

The  expression  (2.1)  can  be  thought  of  as  a  functional  (ie.,  real  valued  operator)  of  the  unknown 
weight  vector  w  used  to  form  the  estimated  noise.  It  is  well  known  [3]  that  the  minimization  of  this 
functional  is  equivalent  to  setting  its  gradient  equal  to  zero.  This  leads  to  the  linear  equation 

Aw(x)  =  b(x)  (2.2) 
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where  w(x)  is  the  unknown  weight  vector  function  evaluated  at  the  delay  point  x,  b  is  a  vector  function 
formed  from  the  side  signals  and  the  main  signal  plus  noise,  and  A  is  a  positive  definite  symmetric 
operator  corresponding  to  the  covariance  matrix  in  discrete  formulations  of  this  problem  (see  Appendix 
1  or  [4]  for  a  discussion  of  the  derivation  of  the  analog  version  of  this  problem). 

2.2  The  Least  Mean  Square  Approach 

The  formulation  of  the  quantities  A  and  b  in  equation  (2.2)  is  a  formidable  computational  task.  As  a 
result,  several  approaches  have  been  advanced  which  attempt  to  circumvent  this  difficulty.  One  of 
these,  the  least  mean  square  (LMS)  algorithm  (cf.,  [5]),  has  been  implemented  on  several  optical 
processors  ([1],  [6]),  including  the  one  under  consideration  here.  It  is  the  performance  of  this  algorithm 
that  we  would  like  to  improve  upon. 

Although  the  LMS  algorithm  is  usually  thought  of  as  an  approximation  of  more  complicated 
algorithms  for  minimizing  the  quantity  (2.1),  one  can  also  think  of  it  directly  as  an  algorithm  for 
minimizing  the  quantity 


|e(t)|2  (2.3) 

instead  of  minimizing  the  quantity  (2.1).  As  was  the  case  before,  this  minimization  problem  is 
equivalent  to  setting  a  certain  gradient  equal  to  zero.  In  the  case  of  a  single  side  signal,  the  gradient 
associated  with  (2.3)  is  proportional  to 


e(t)ni(t-x).  (2.4) 

This  gradient  expression  has  the  advantage  of  being  easy  to  compute.  In  particular,  it  does  not  involve 
the  calculation  of  a  covariance  matrix.  However,  the  expression  (2.3)  only  has  a  minimum  in  the  case 
when  (2.4)  is  zero.  This  can  happen  only  when  e(t)  is  zero.  But  e(t)  has  the  form 

s(t)  +  n(t)  -  y(t). 

We  hope  to  make  the  quantity  n(t)  -  y(t)  zero,  but  in  general  s(t)  is  not  zero,  and  so  e(t)  will  also  not 
be  zero  when  there  is  a  main  signal  present.  This  is  a  potential  problem  with  LMS  and  we  will 
consider  it  further  in  the  next  section. 

Iterative  processes  have  the  general  form 
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(2.5) 


wi+iM  =  w»W  +  ajPiW 

i  =  0,1,... 


where  w^(x)  is  the  i1*1  iterative  approximation  of  w(x),  Pj(x)  is  a  direction  vector  which  indicates  the 
direction  to  go  in  to  get  to  the  next  iterate  and  aj  is  the  scalar  stepsize  that  tells  how  far  to  go 

in  the  direction  p^. 


For  the  LMS  algorithm,  we  take  pj(x)  to  be  the  vector  given  by  (2.4)  with  t  =  iAt,  where  At  is  the 
time  increment  between  iterations.  The  stepsize  is  taken  to  be  a  sufficiently  small  fixed  scalar  a.  As 
discussed  in  Appendix  1,  it  is  possible  to  solve  the  LMS  iteration  process  directly  to  obtain 


wk(x)  =  a  £  einj(iAt  -  x), 
i=0 


(2.6) 


where  e^  =  e(iAt).  Letting  At  — »  0,  we  get  the  analog  version  of  (2.6),  namely 

t 

w(x)  =  a  /  e(s)nj(s-x)ds.  (2.7) 

0 

It  is  actually  this  solution,  and  not  the  iterative  version  of  LMS,  that  is  being  implemented  in  the 
optical  processors  discussed  in  [1]  and  [6j.  In  this  form,  LMS  is  not  a  true  iterative  algorithm.  Rather, 
it  represents  an  approximate  version  of  a  complete  solution  of  the  minimization  problem. 


The  advantage  of  LMS  is  the  ease  with  which  it  can  be  implemented  in  a  real  time  processor.  The 
flow  of  data  in  such  a  processor  is  uninterrupted  as  the  as  the  solution  is  continuously  updated.  This 
makes  it  particularly  appealing  for  use  in  an  optical  processor.  This  is  a  desirable  property  of  LMS 
that  we  should  try  to  retain.  Unfortunately,  there  are  problems  inherent  in  LMS  that  result  in  a 
degradation  of  performance  that  can  reach  unacceptable  levels. 


2.3  Problems  with  LMS 

As  mentioned  in  the  previous  section,  there  may  be  problems  associated  with  LMS  when  a  main  signal 
is  present  (ie.,  signal-to-noisc  ratio  (SNR)  greater  than  0).  We  can  observe  this  phenomenon  in  the 
following  numerical  example  (all  numerical  examples  for  this  report  were  produced  on  a  personal 
computer  using  Turbo- Pascal). 

Figure  2.1  shows  the  performance  of  a  numerical  simulation  of  the  LMS  method  in  a  case  when  the 
main  signal  s(t)  is  0.  The  signal  received  at  the  main  antenna  is  just  a  noise  signal  n(t)  which  we  are 
attempting  to  cancel.  In  this  example, 
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n(t)  =  sin(20fft). 


(2.3) 


The  side  antenna  signal  is  of  the  form 

n^(t)  =  sin(20)rt  +  0.1).  (2.9) 

There  are  30  delay  taps  spread  over  a  delay  aperture  of  0.3  sec.  The  fixed  stepsize  is  a  =  0.05.  As  we 
can  see  in  this  figure,  good  noise  cancellation  is  achieved  after  approximately  half  a  second. 

However,  when  even  a  small  main  signal  is  present,  performance  deteriorates  drastically.  Figure  2.2 
shows  the  effect  of  adding  a  main  signal  of  the  form 

s(t)  =  0.5  sin(30xt)  (2. 10) 

(so  that  the  SNR  is  0.5).  The  graph  shows 

n(t)  -  y(t)  (2.11) 

the  difference  between  actual  noise  and  estimated  noise.  As  one  can  see,  there  is  essentially  no  noise 
cancellation.  This  is  in  agreement  with  observed  experimental  results  [7],  citing  that  LMS  works  well 
in  "extremely  poor  SNR  environments”.  Indeed,  there  is  no  hope  of  it  working  otherwise! 

Why  should  this  be  the  case?  Recall  that 


e(t)=d(t)-y(t) 


where 


d(t)  =  s(t)  +  n(t) 

is  the  signal  received  at  the  main  antenna.  If  we  substitute  this  expression  for  e(t)  in  (2.4),  and  then 
use  (2.4)  as  the  direction  vector  pj  in  (2.5),  with  t  =  iAt,  we  obtain  the  following  form  for  LMS: 

wj+](x)  =  wjW  +  »(d(>At)  -  y(iAt))nj(iAt  -  x).  (2.12) 

Convergence  of  this  method  implies 
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Wi+1  ~  wi 

for  large  i,  which,  in  turn,  implies  that  the  second  term  on  the  right  side  of  (2.12)  must  converge  to  0. 
But  this  implies 


d(iAt)  -  y(iAt)  — *  0 


or,  equivalently, 


s(iAt)  +  n(iAt)  -  y(iAt)  — *  0. 

But  this  quantity  can  never  be  0  if  s(t)  is  independent  (uncorrelated)  of  n(t)  and  y(t)  (which  we  hope  is 
the  case  if  we  are  going  to  avoid  cancelling  the  main  signal!).  Thus,  LMS  is  trying  to  annihilate  a 
quantity  that  can  never  be  2ero. 

To  put  this  another  way,  in  the  case  when  s(t)  is  not  identically  zero,  the  quantity  (2.3)  has  no 
minimum  weight  associated  with  it.  LMS  is  trying  to  solve  a  problem  that  has  no  solution.  The 
method  which  we  introduce  in  the  next  section  not  only  has  better  performance  characteristics  than 
LMS,  but  also  completely  avoids  this  serious  drawback  of  LMS  as  a  noise  cancellation  algorithm  in  the 
presence  of  a  main  signal. 
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3.  Nonstationary  Iterative  Methods 

3.1  New  Approach  to  Iteration 

We  now  consider  a  new  way  of  incorporating  iterative  algorithms  in  a  real  time  signal  processing 
environment.  The  motivation  for  the  approach  is  optical  signal  processing,  which  allows  us  the 
computational  speed  to  consider  such  an  approach.  The  uniqueness  of  the  method  lies  in  the  fact  that 
the  data  flow  is  allowed  to  drive  the  iterations,  providing  effective  real  time  performance.  Rather  than 
perform  multiple  iterations  on  a  fixed  problem,  which  must  be  formulated  from  stored  data,  we  allow 
variations  in  the  incoming  data  to  continuously  update  the  problem  while  iterations  are  being 
performed.  This  is  well  suited  to  optical  processing,  where  data  storage  and  retrieval  can  be  a  problem, 
but  computational  speed  is  not.  The  result  is  an  adaptive  process  that  can  significantly  outperform  the 
traditional  LMS  algorithm. 

In  contrast  to  the  LMS  algorithm,  the  new  iterative  technique  deals  with  equation  (2.2)  directly,  rather 
than  an  approximation  of  that  equation.  To  illustrate  the  technique,  we  consider  the  simplest  type  of 
iterative  algorithm  of  the  form  (2.5),  namely  the  steepest  descent  algorithm  with  fixed  stepsize.  This 
algorithm  has  the  form 

wn+l  =  wn+arn  (3.1) 

r„  =  b  -  A  w„. 

The  fixed  scalar  a  is  the  stepsize.  The  sequence  {wn}  constructed  from  (3.1)  will  converge  to  the 
solution  w*  of  (2.2)  provided 

I 

a  <  1/M 


where  M  is  the  largest  eigenvalue  of  A  (cf.  [8]). 

The  usual  approach  in  implementing  an  algorithm  such  as  (3.1)  is  to  compute  A  and  b  from  the  input 
data  once,  and  then  to  regard  then  as  fixed  while  the  iterations  are  being  performed.  However,  for  our 
real  time  acousto-optic  processor,  it  is  easier  to  recompute  A  and  b  on  every  iteration,  rather  than  to 
store  and  retrieve  their  values.  This  rccomputation  of  A  and  b,  however,  introduces  variations  in  their 
values  as  the  iterations  are  being  performed.  Thus,  it  is  more  appropriate  to  write  the  algorithm  (3.1) 
in  the  form 


wn+l  =  wn  +  «  fn  (3.2) 

rn  =  bn  -  Anwn 
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where  An  and  bn  are  the  updated  versions  of  A  and  b  at  the  n1*1  iteration. 

The  algorithm  (-V2)  is  an  example  of  a  nonstationarv  iterative  process  as  defined  for  example  in  [9],  In 
practice,  one  finds  that  An  and  bn  do  in  fact  change  on  every  iteration.  What  remains  the  same, 
however,  is  that  the  sequence  of  problems 

An  w  =  bn,  n  =  0,1,2,...  (3-3) 

all  have  the  same  solution  w*  for  each  value  of  n  (or,  at  least,  w»  changes  slowly  in  time  compared  to 
the  speed  of  the  iteration  process). 

This  makes  sense  in  the  context  of  our  noise  cancellation  problem.  Recall  that  the  weight  vector 
solution  w*  represents  which  of  the  delayed  versions  of  the  side  signal  are  to  be  weighted.  This  is  not 
going  to  change  from  one  iteration  to  the  next.  Thus,  the  solution  does  not  change,  even  though  the 
formulated  problem  changes  from  one  iteration  to  the  next. 

When  the  solution  does  change  over  time,  this  type  of  process  will  adapt  to  the  new  solution  since  we 
are  always  incorporating  the  most  recent  signal  data.  Moreover,  convergence  to  the  new  solution  value 
should  be  very  quick  since  the  old  solution  value  provides  a  good  starting  point  from  which  the 
iteration  process  can  seek  the  new  solution. 

Other  iterative  algorithms  can  also  be  put  in  nonstationary  form.  One  improvement  on  the  steepest 
descent  algorithm  is  to  optimize  the  stepsize  at  each  iteration  step.  The  nonstationary  version  of  this 
algorithm  has  the  form 


wn+l  =  wn  +  anrn 

rn  =  bn  -  Anwn  (3-4 ) 

an  =  (rn>rn)/(rn*Anrn)- 


The  nonstationary  conjugate  gradient  algorithm  takes  the  form 
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(3.5) 


wn+l  =  wn  +  anPn 

pn+l  =  rn+l  ‘  CnPn 
an  =  (rn>Pn)/(Pn>AnPn) 
cn  =  (rn+i»Anpn)/(pn.Anpn) 
rn  =  ^>n  ‘  Anwn- 

Here,  an  and  cn  are  scalars,  (  ,  )  indicates  inner  product,  and  pn  is  the  direction  vector.  In  the  next 
section,  we  show  numerically  that  sequences  {wn}  generated  from  either  (3.2),  (3.4)  or  (3.5)  will 
converge  to  the  common  solution  w*  of  the  sequence  of  problems  (3.3).  In  section  4.1  we  look  at 
analytical  results  concerning  the  convergence  of  such  sequences  to  the  desired  solution  w*. 

3.2  Numerical  Results 

This  section  presents  the  results  of  three  numerical  simulations  comparing  the  performance  of  several 
nonstationary  iterative  algorithms  and  the  traditional  LMS  algorithm.  As  mentioned  previously,  all 
numerical  results  were  produced  on  a  personal  computer.  In  order  to  be  computationally  feasible  on 
such  a  computer,  the  examples  are  constructed  so  that  an  exact  solution  is  possible  with  a  relatively 
small  number  of  tap  weights  (we  choose  6  tap  weights  for  the  iterative  algorithms  and  30  for  LMS).  In 
order  to  study  the  behavior  and  stability  of  the  methods  for  larger  numbers  of  tap  weights,  more 
computer  power  will  be  needed.  For  an  optical  processor,  however,  large  numbers  of  tap  weights  will 
present  no  computational  difficulty. 

EXAMPLE  1:  For  the  first  example,  we  have  no  main  signal,  so  that  SNR  =  0.  The  noise  signal  to 
be  cancelled  is 


n(t)  =  sin(20xt  +  50t“),  0  <  t  <  1. 

The  graph  of  n(t)  is  shown  in  Figure  3.1  (a).  A  single  side  antenna  receives  a  copy  of  the  noise  signal 
in  the  form 


n^(t)  =  n(t  +  Q.l). 

Delayed  versions  of  this  side  antenna  signal  are  formed  over  a  total  delay  aperture  of  R  =  0.3. 

Figure  3.1  (b)  shows  the  results  for  the  LMS  algorithm.  The  algorithm  is  run  with  a  fixed  stepsize  of 
0.1  and  30  delay  taps.  200  iterations  arc  used  over  a  lime  interval  from  t  =  0.35  to  t  =  1.3  (ic.,  at 
each  iteration  the  current  time  is  updated  by  a  time  increment  of  At  =  (1.3  -  0.35)/200.  The  graph 
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Amplitude 


shows  the  difference  between  actual  noise  and  estimated  noise.  We  observe  that  this  output  noise 
settles  down  to  a  signal  of  amplitude  0-05,  although  the  algorithm  does  display  some  problems  near 
t=l,  where  the  noise  signal  becomes  compressed  (higher  frequency). 


Figures  3.1(c)-(e)  show  numerical  results  for,  respectively,  the  nonstationary  steepest  descent,  with 
fixed  and  optimized  stepsize,  algorithm  and  conjugate  gradient  algorithm.  The  number  of  delay  taps 
used  is  6,  so  that  the  covariance  matrices  An  are  6X6,  and  the  vectors  bn  have  6  components.  The 
entries  in  An  and  bn  are,  respectively,  auto-correlation  and  cross-correlation  functions,  which  are 
computed  using  integration  over  time.  Theoretically,  this  integration  should  be  performed  over  the 
time  interval  -oo  to  oo.  However,  in  practice  this  integration  can  only  be  done  over  a  finite  interval. 
We  choose  the  interval  from  tg  -  3  to  tg,  where  tg  is  current  time.  The  integration  is  performed 
numerically  in  the  simulations  using  a  200  point  Simpson’s  rule. 

For  these  nonstationary  algorithms,  the  values  of  An  and  bn  are  recomputed  on  every  iteration.  The 
simulations  are  run  from  time  t  =  0.35  to  t  =  1.3.  At  each  iteration,  the  current  time  is  updated  by 
an  amount  At  =  (1.3-0.35)/(#  iterations). 

From  Figures  3-l(c)-(e),  one  can  see  that  in  this  example  the  nonstationary  iterative  algorithms 
provide  a  significant  improvement  in  performance  over  the  LMS  algorithm.  The  complexity  of  the 
noise  signal  causes  no  difficulties  for  these  algorithms.  Not  surprisingly,  the  best  performance  is 
obtained  from  the  conjugate  gradient  algorithm,  computationally  the  most  complex  of  the  algorithms. 

EXAMPLE  2:  This  is  another  example  with  SNR  =  0.  We  consider  a  noise  signal,  shown  in  Figure 
3.2(a),  of  the  form 


sin(50xt) 
n(t)  =  <  sin(100»t) 
sin(50!rt) 


0  <  t  <  0.5 
0.5  <  t  <  1.0 
1.0  <  t  <  1.5. 


A  side  antenna  receives  a  signal  of  the  form 

n^(t)  =  n(t  +  0.1). 

This  particular  noise  signal  was  chosen  to  provide  another  example  where  the  LMS  algorithm  has 
apparent  difficulty. 
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The  LMS  algorithm  was  run  with  a  stepsize  of  0.000001,  with  ail  other  parameters  being  the  same  as  in 
the  previous  example.  Figure  3.2(b)  shows  the  output  of  this  algorithm.  As  one  can  see,  there  is 
essentially  no  noise  cancellation.  Larger  numbers  of  iterations,  and  larger  and  smaller  stepsizes 
produced  no  better  results. 

Figures  3.2(c)-(d)  show  the  results  for  nonstationary  steepest  descent  and  nonstationary  conjugate 
gradient  algorithms.  The  noise  cancellation  is  similar  to  the  previous  example,  and  is  much  better 
than  LMS. 

EXAMPLE  3:  For  our  final  example,  we  revisit  the  problem  considered  in  Section  2.3.  Recall  that  the 
LMS  algorithm  did  not  work  at  all  in  the  presence  of  a  main  signal.  Figures  3.3  (a)-(b)  show  the 
results  of  applying  the  nonstationary  steepest  descent  with  fixed  stepsize  and  nonstationary  conjugate 
gradient  algorithms  to  the  same  problem.  The  noise  signal  is  defined  by  (2.8),  with  side  signal  given 
by  (2.9)  and  main  signal  given  by  (2.10).  As  one  can  see  from  the  figures,  the  performance  of  these 
algorithms  is  not  affected  by  the  presence  of  a  main  signal.  Figures  3.3(c)-(d)  show  the  effect  of  an 
even  larger  SNR  of  10.  The  steepest  descent  algorithm  remains  unaffected,  while  there  is  some 
deterioration  in  the  performance  of  the  conjugate  gradient  algorithm.  It  is  believed  that  this  is  due  to 
the  effect  of  the  large  magnitude  of  s(t)  on  the  numerical  integration  scheme,  and  not  due  to  the 
conjugate  gradient  algorithm  itself.  In  this  example,  apparently  conjugate  gradient  is  more  sensitive 
than  steepest  descent  to  errors  in  the  computation  of  An  and  bn.  This  is  not  believed  to  generally  be 
the  case. 

What  these  examples  show  is  that  there  are  situations  where  LMS  does  not  work  at  all  as  a  noise 
cancellation  algorithm.  We  have  shown  that  nonstationary  iterative  algorithms  will  work  in  these 
same  situations.  Since  these  simulations  were  run  on  a  PC,  the  examples  had  to  be  set  up  so  that  a 
solution  could  be  attained  with  a  small  number  of  tap  weights  (6).  The  performance  of  these 
nonstationary  algorithms  should  be  investigated  on  larger  computers  using  a  greater  number  of  tap 
weights.  Matrix  pre-conditioning  techniques  may  be  necessary  in  this  case  to  deal  with  possible  ill 
conditioning  effects. 

4.  Analysis 

Not  much  is  available  in  the  literature  concerning  analysis  results  for  nonstationary  iterative  processes 
of  the  type  we  are  considering  here.  This  is  not  surprising,  since,  without  optical  processing,  such  a 
process  presents  a  formidable  computational  task.  The  next  section  contains  a  convergence  proof  for 
the  nonstationary  steepest  descent  algorithm.  In  Section  4.2,  convergence  results  are  combined  with 
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(*)  NONSTATIOARY  STEEP.  DESC. 
WITH  SNR  =  0.5 


(b)  NONSTATIONARY  CONJ.  CRAD. 
WITH  SNR  =  0.5 


(c)  NONSTATIONARY  STEEP.  DESC. 
WITH  SNR  =  10 


(d)  NONSTATIONARY  CONJ.  CRAD. 
WITH  SNR  =  10 


FICURE  3.3  EXAMPLE  3 


perturbation  results  to  produce  an  error  analysis  for  tKj  .  ’’•'rithm. 

4.1  Nonstationary  Convergence  Results 

The  situation  we  are  considering  is  as  follosvs.  We  have  a  sequence  (Ak)  of  positive  definite  symmetric 
linear  operators  (for  example,  covariance  matrices)  and  a  sequence  {b^}  of  vectors  such  that  the 
equations 


Akw  =  bk,  k  =  0,1,2,...  (4.1) 

have  a  common  solution  w*.  Let  the  scalar  a  be  such  that 

II  I  -  *  Ak  ||  <  €  <  1  (4.2) 

for  each  k  =  0,1,2,...,  and  some  £  <  1.  The  operator  I  is  the  identity  operator.  This  is  not  an 
unreasonable  condition  since  a  similar  condition  is  necessary  for  convergence  of  the  normal  steepest 
descent  process  [10].  We  then  have  that  the  sequence  {wk}  generated  by  the  process 

wk+l  =  wk  +  a  (bk  ‘  Akwfc)’  k  =  O'1’2*-  (4-3) 


converges  in  norm  to  w*. 

To  prove  this,  note  in  the  following  that 


bk  '  Akw*  =  0 


so  that  we  have 


II  wk+l  '  w*  II  =  II  wk  +  a(bk  '  Akwk^  *  w*  II 

=  ||  w,.  -  w.  +  a(bk-Akwk)  -  a(bk-Afcw*)  )j 
=  II  wfe  -  w.  -  aAk(wk  -  w.)  || 

=  II  (I  -  aAfcKwk  '  w*)  II 

<  II  I  '  aAk  ||  ||  wk  -  w.  || 
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<  (  II  wk  -  w*  II 


<^k+1||  w0  -  w*  ||. 

Since  (  <  1,  this  last  term  — *  0  as  k  — *  co.  This  completes  the  proof. 

As  a  corollary,  we  note  that  it  suffices  to  replace  condition  (4.2)  with 

I!  I  -  aAk  ||  =  (k  <  1  (4.4) 

for  each  k.  We  then  find  that 

II  wk+l  ‘  w*  II  <  ^  JI  j  H  w0  '  W*H 

and  the  term  on  the  right  side  also  — *  0  as  k  — ►  oo  since  each  factor  in  the  product  is  <  1.  A 
sufficient  condition  for  satisfying  (4.4)  is 

a  <  (4.5) 


where  is  the  smallest  eigenvalue  of  A^. 

In  [II],  convergence  results  are  given  for  the  case  when  the  sequence  of  operators  {A^}  satisfies 
A^  -*  A  for  some  fixed  operator  A.  However,  the  situation  considered  here,  namely  that  the  equations 
(4.1)  have  a  common  solution,  seems  to  better  reflect  what  would  happen  in  practice.  Table  4.1  shows 
data  taken  at  three  different  time  steps  during  one  of  the  simulation  runs  discussed  in  the  previous 
section.  The  three  matrices  shown  here  are  obviously  very  different.  What  is  the  same  is  the  solution 
w  =  (0,0, 1,0, 0,0)  to  the  three  linear  equations  represented  by  these  matrices  and  vectors. 

4.2  Error  Analysis 

In  [12],  a  nonstationary  perturbation  analysis  is  given  for  the  stationary  steepest  descent  algorithm. 
That  is,  the  fixed  problem 


Aw  =  b 


is  solved  using  the  usual  steepest  descent  algorithm,  and  the  effects  of  different  perturbations 
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Covariance  Matrix  l: 


0.100725 
-0.003987 
-0.018533 
“0.002036 
-0.01 1653 
0.003697 


-0.003987 

0.092327 

-0.012011 

-0.023480 

-0.000029 

-0.003673 


-0.018533 

-0.012011 

0.083455 

“0.019583 

“0.024551 

0.007741 


“0.002036 

“0.023480 

-0.019583 

0.074181 

-0.024853 

“0.019566 


“0.01 1653 
-0.000029 
-0.024551 
-0.024853 
0.066859 
-0.026961 


0.003697 

-0.003673 

0.007741 

-0.019566 

“0.026961 

0.058625 


b-Vec  tor  1 : 

-0.01853  -0.01201  0.08345  -0.01958  -0.02455  0.00774 

Solution  1 : 

-0.00000  0.00000  1.00000  0.00000  0.00000  0.00000 


Covariance  Matrix  2: 


0.133363 
0.125000 
0.1  16637 
0.108363 
0.100000 
0.091637 


-0.125000 
0.125000 
-0.1 16637 
0.108363 
-0.100000 
0.091637 


0.116637 
-0.1 16637 
0.1 16637 
-0.108363 
0.100000 
-0.091637 


-0.108363 
0.108363 
-0. 108363 
0.108363 
-0.100000 
0.091637 


0.100000 

-0.100000 

0.100000 

-0.100000 

0.100000 

“0.091637 


-0.091637 

0.091637 

-0.091637 

0.091637 

“0.091637 

0.091637 


b-Vector  2: 

0.11664  — 0 . T 1 664  0.11664  -0.10836  0.10000  -0.09164 


Solution  2: 

-0.00000  0.00000  1.00000  -0.00000  0.00000  -0.00000 


Covariance  Matrix  3: 


0.167037 

0.037466 

-0.009107 

-0.002472 

-0.013037 

-0.000387 


0.037466 

0.157761 

0.036683 

0.000259 

0.003787 

0.013928 


-0.009107 

0.036683 

0.150458 

0.036188 

0.006093 

0.010720 


-0.002472 
-0.000259 
0.036188 
0.141609 
0.031 1 16 
0.006366 


-0.013037 
0.003787 
0.006093 
0.031116 
0. 132684 
0.026026 


-0.000387 

-0.013928 

0.010720 

0.008366 

0.026026 

0.125200 


b-Vector  3: 

-0.00911  0.03668  0.15046  0.03619  0.00609  0.01072 

Solution  3: 

-0.00000  -0.00000  1.00000  0.00000  -0.00000  0.00000 


TABLE  4.1  THREE  DIFFERENT  MATRIX  PROBLEMS  WITH  SAME  SOLUTION 
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introduced  at  each  step  of  the  iteration  process  are  studied.  Thus,  at  the  step,  instead  of  having 
exactly  A  and  b  available,  we  assume  that  we  are  dealing  with  perturbed  versions  of  these  quantities: 

A  +  Mn 


The  analysis  provides  a  bound  for  the  difference  between  the  perturbed  iterates  wn  and  the  normal 
unperturbed  iterates  wQ. 

In  this  section  we  apply  these  ideas  to  the  nonstationary  steepest  descent  process.  As  before,  we 
consider  a  sequence  of  problems 


Anw  =  bn,  n  =  0,1,2,...  (4.6) 

with  common  solution  w*.  We  now  introduce  perturbations  S An  and  £bn  at  each  iteration  step,  so 
that  we  obtain  a  sequence  of  perturbed  problems  of  the  form 

Anw  =  b„ 


where 

An  —  Ajj  +  iAj 
bn  =  bn  +  ^bn. 

This  is  a  particularly  important  problem  to  consider  in  the  context  of  optical  implementation,  since  we 
can  expect  errors  in  the  formulation  of  A  and  b  at  each  iteration  step.  We  now  determine  the  effect  of 
these  errors. 

The  nonstationary  steepest  descent  algorithm  applied  to  the  sequence  of  problems  (4.6)  has  the  form 

*n+l  =  +  ^n  *  *n*n).  "  =  0,1,2,...  (4.7) 

We  assume  that  the  stepsize  a  has  been  chosen  to  satisfy  the  condition  (4.2).  The  nonstationary 
process  generates  a  sequence  (wn).  From  a  practical  point  of  view,  what  we  would  like  to  know  is:  for 
large  n,  how  far  off  is  the  perturbed  iterate  wn  from  the  true  solution  w*  of  the  unperturbed  system 
(4.1)? 
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We  answer  this  question  in  several  stages.  First,  we  determine  the  maximum  difference  between  the 
perturbed  iterate  wn  and  the  corresponding  iterate  wn  from  the  unperturbed  process  (4.3).  The 
analysis  here  is  very  similar  to  that  given  in  [12],  so  we  only  sketch  the  details. 


Define 


<5wn  =  wn  -  wn,  n  =  0,1,2,... 


Subtracting  equation  (4.3)  from  (4.7),  we  find  that  iwn  satisfies  a  nonhomogeneous  difference  equation 
of  the  form 


where 


*wn+l  =  (!  ‘  a^n)*wn  +  a  gn,  n  =  0,1,2,. 


gn  =  *bn  -  *Anwn. 


(4.8) 


Equation  (4.8)  can  be  solved  directly  to  obtain 


6  w 


n+1 


=  £  {  ft  O-aAj)  jagk 

k=0  lj=k+l  J  J 


(see  [12]  for  the  precise  meaning  of  the  noncommutative  product  of  operators  on  the  right).  Thus, 

II  *wn+1  H  ^  a  ^  J]  |  n  i  I!1 '  aAnllj  j  (max  ||gk||  ).  (4.9) 


Denote 


a  =  sup  ||  $An  || 
n 


0  =  sup  ||  <$bn  || 
n 


W  =  sup  II  wn  ||. 
n 


a  and  0  are  finite  by  assumption  and  W  is  finite  since  we  assume  {wn}  is  a  convergent  sequence. 
Then 
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max  ||g.  ||  <  0  +  aW. 
k<n  * 


||I  -  aAjlj  <  ||I  -  aAj||  +  aa 


<  S  +  aa 


so  that 


t{  ft  £{€  —  }*. 

k=0  lj=k+l  J  k=0 


(4.10) 


The  right  side  of  (4.10)  converges  to 


1  -  (£  +  *a) 


as  n  — ►  oo,  provided 


i  +  aa  <  1. 


(4.11) 


Thus,  in  this  case  we  have  from  (4.9), 


T 11  11  s  i  ( f  +  “w  )• 


(4.12) 


If  we  define  mj  as  the  smallest  eigenvalue  of  Aj  and  let  m  equal  the  infimum  of  the  sequence  {mj}  then 


If  we  assume  m  >  0  and  take 


the  condition  (4.11)  becomes 


||I  -  aAj||  =  1  -  anrij  <  1  -  am. 


(  =  1  -  am  <  1 


a  <  m 


and  the  bound  (4.12)  can  be  written  as 


sup  ||  wn  -  wn 
n 


<  m^a(^  +  “W> 


(4.13) 
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This  implies  that  the  perturbed  process  (4.7)  could  become  unstable  if  the  perturbations  on  An  exceed 


m. 


VVe  can  now  estimate  the  difference  between  \vn  and  w*: 

II  *n  '  w*  II  <  II  *n  -  wn  II  +  II  wn  *  w*  II- 


For  sufficiently  large  n,  given  c  >  0  we  can  write 

II  *n  -  »•  II  <  fTT^a  (  P  +  *W  )  +  <•  (4.14) 


This  is  the  desired  result  providing  the  distance  of  the  perturbed  iterates  w„  from  the  true  solution  w*. 
Not  surprisingly,  this  distance  depends  on  the  size  of  the  errors  a  and  0,  and  this  distance  can  blow  up 
very  quickly  if  a  is  close  to  m. 


For  the  case  of  a  single  fixed  equation,  error  bounds  analogous  to  (4.14)  are  frequently  written  in  terms 
of  the  condition  number  of  a  matrix.  VVe  can  obtain  a  similar  result  here,  if  we  agree  to  define  the 
"condition  number  of  the  sequence  {An}”  to  be  the  quantity 


r  = 


M 

m 


where  M  is  the  supremum  of  the  sequence  {Mn},  where  Mn  is  the  maximum  eigenvalue  of  An.  We 
assume  M  is  finite,  as  well  as  the  quantity 


B  =  sup  |1  bn  ||. 
n 

Then  from  (4.14)  we  get  a  bound  for  the  relative  error: 

II  *»  -  "•  ll/W  <  ^(b  +  fi> 

Thus  the  relative  error  in  w„  is  proportional  to  the  relative  errors  in  (An)  and  (bn).  The  constant  of 
proportionality  is  dependent  on  T,  the  condition  number  of  the  sequence  {An},  as  well  as  the 
proximity  of  a  to  m.  Once  again  we  observe  that  the  process  can  become  unstable  if  the  perturbations 
(a)  on  An  exceed  the  smallest  (m)  of  the  eigenvalues  of  all  the  An. 


5.  Optical  Systems 

In  this  section  we  consider  two  approaches  for  possible  optical  implementation  of  the  nonstationary 
iterative  algorithms  discussed  in  the  previous  sections.  The  first  of  these  is  a  hybrid  system  that  would 
use  optics  to  do  the  bulk  of  the  computational  effort  and  an  electronic  microprocessor  to  perform  the 
actual  algorithm  iteration  step.  This  approach  allows  some  flexibility  in  the  choice  of  the  algorithm, 
although  its  performance  would  be  limited  by  the  optics  to  electronics  conversion.  The  second 
approach  is  an  all  optical  implementation  of  the  nonstationary  steepest  descent  with  fixed  stepsize 
algorithm.  This  processor  would  be  able  to  run  just  the  one  algorithm.  It  would,  however,  be  an 
important  step  toward  realizing  all-optical  implementations  of  the  other  algorithms,  such  as  conjugate 
gradient,  and  it  would  provide  real  time  performance. 

5.1  Hybrid  System 

The  first  approach  we  consider  is  an  electro-optic  hybrid  system.  This  system  will  use  optics  to  do  the 
hard  computational  task  of  computing  the  covariance  matrix  An  and  the  vector  bn  on  every  iteration. 
These  computations  involve  correlations  and  integrations  which  can  be  easily  accomplished  optically. 
The  iteration  step  of  algorithms  such  as  (3.4)  and  (3.5),  however,  involve  scalar  division  which  cannot 
easily  be  done  in  the  optics  domain.  An  electronic  microprocessor  will  be  used  to  perform  this  step. 
The  use  of  a  programmable  microprocessor  here  will  also  allow  the  testing  and  comparison  of  different 
algorithms  in  a  real  signal  environment.  The  division  of  tasks  between  optics  and  electronics  in  this 
hybrid  processor  is  shown  in  Figure  5.1. 

An  overview  of  the  hybrid  system  is  shown  in  Figure  5.2.  A  single  side  signal  n^(t)  will  pass  through  a 
tapped  delay  line  and  drive  an  array  of  light  emitting  diodes  (LED’s).  The  LED’s  are  a  low  cost 
alternative  to  a  laser  system.  Also,  unlike  lasers,  the  LED's  have  linear  characteristics  over  a  broad 
range  in  converting  the  input  electrical  signal  into  light,  and  their  incoherent  nature  frees  the  system 
from  speckle  (coherent  noise)  present  in  lasers. 

The  LED’s  will  illuminate  an  acousto-optic  (AO)  spatial  light  modulator.  Figure  5  3  shows  the  details 
of  the  optics.  The  AO  cell  will  simultaneously  be  driven  by  the  same  side  signal  n  j(t),  so  that  delayed 
versions  of  that  signal  will  be  spread  across  the  cell  aperture.  This  aperture  should  be  wide  enough  to 
produce  sufficient  delay  (about  40  p-sec)  in  the  side  signal.  This  use  of  AO  cells  to  produce  delayed 
signals  is  similar  to  the  techniques  used  in  the  optical  signal  processors  of  [1]  and  [6). 

The  LED’s  produce  a  vector  whose  components  are  delayed  versions  of  the  side  signal  nj(t).  The  same 
vector  is  represented  in  the  crystal  aperture  of  the  AO  cell.  The  result  of  illuminating  this  aperture 
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FIGURE  5.2  HYBRID  SYSTEM  OVERVIEW 
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C)  SIDE  view 


FIGURE  5.3  DETAILS  OF  OPTICS  FOR  COMPUTING  ApJl(t) 
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with  the  LED's  is  the  outer  product  of  these  vectors,  which  is  a  matrix.  This  matrix  is  collected  and 
time  integrated  by  a  2-dimensional  time  integrating  charge  coupled  device  (CCD)  detector  array.  The 
output  of  the  detector  array  is  the  covariance  matrix  An.  A  frame  grabber  will  send  the  matrix  data 
to  the  digital  microprocessor. 

To  construct  the  vector  bn  at  each  time  step,  a  single  LED,  driven  by  the  main  signal  plus  noise,  s(t) 
+  n(t),  illuminates  an  AO  cell  which  is  simultaneously  being  driven  by  the  side  signal  n^(t).  The 
resulting  modulated  light  represents  a  vector  whose  components  are  the  product  of  s(t)  -f  n(t)  with 
delayed  versions  of  n^(t).  This  light  is  collected  onto  a  one  dimensional  CCD  time  integrating  detector 
array.  The  output  of  this  detector  array  is  bn,  which  is  sent  to  the  microprocessor  via  an  analog  to 
digital  (A/D)  converter  board. 

The  main  signal  plus  noise,  s(t)  +  n(t),  as  well  as  the  delayed  versions  of  the  side  signal  n^(t)  from  the 
tapped  delay  lines,  are  also  sent  through  the  A/D  board  to  the  microprocessor.  The  iteration  step  and 
the  actual  noise  cancellation  will  be  performed  in  the  digital  signal  domain  within  the  microprocessor. 

Such  a  hybrid  system  should  be  viewed  as  a  low  cost  proof  of  principle  device  that  could  validate  this 
class  of  nonstationary  iterative  processes  for  signal  processing  applications.  The  A/D  conversion  would 
limit  its  usefulness  as  a  real  time  processor. 

5.2  All-Optical  System 

Figure  5.4  shows  a  simplified  system  diagram  for  a  possible  optical  implementation  of  the 
nonstationary  steepest  descent  with  fixed  stepsize  algorithm.  Only  one  side  signal  is  shown,  although 
multiple  side  signals  could  be  handled  with  a  multi-channel  AO  cell. 

AO  cells  are  used  to  produce  a  continuum  of  delayed  versions  of  the  side  signals,  and  to  form  products 
of  these  delayed  signals  with  other  quantities.  A  lens  performs  spatial  integration.  Liquid  crystal  light 
valves  (LCLV)  perform  time  integration.  The  weight  vector  w  is  computed  in  the  optic  domain,  and 
the  output  of  the  system  is  an  optical  representation  of  the  estimated  noise  signal  y(t).  This  will  be 
converted  by  a  detector  to  the  electronic  domain  where  it  will  be  recombined  with  the  main  signal  plus 
noise,  s(t)  +  n(t),  to  produce  the  final  system  output,  namely 

s(t)  +  n(t)  -  y(t). 

This  signal  should  be  close  to  the  main  signal  s(t). 

While  the  nonstationary  steepest  descent  algorithm  is  not  the  most  powerful  we  have  considered  here, 
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it  is  the  simplest  to  implement  optically.  A  realization  of  the  optical  implementation  for  this 
algorithm  would  be  an  important  step  toward  opening  up  this  whole  class  of  nonstationary  iterative 
algorithms  to  optical  implementation. 

6.  Concluding  Remarks  and  Recommendations 

6.1  The  State  of  the  Art 

Optical  signal  processors  have  already  been  built  which  can  perform  adaptive  noise  cancellation  ([1] 
and  [6]),  and  optical  processors  implementing  iterative  algorithms  such  as  steepest  descent  and 
conjugate  gradient  have  also  been  built,  or  at  least  proposed  ([13]  and  [14]).  So  what  is  new  about  the 
approach  that  is  being  presented  here?  The  optical  processors  that  actually  take  in  signal  data  and 
perform  adaptive  noise  cancellation  in  real  time  are  implementing  some  version  of  the  LMS  algorithm, 
and  thus  suffer  from  the  performance  limitations  of  that  algorithm.  The  optical  processors  which  use 
true  iterative  algorithms  such  as  steepest  descent  do  so  on  fixed  matrix  data,  in  the  form  of  some  type 
of  mask.  Thus  they  are  not  true  real  time  signal  processors,  ie.,  they  cannot  formulate  the  matrix 
problem  in  real  time  and  solve  it.  The  approach  we  are  advancing  here  does  propose  to  formulate  the 
problem  in  real  time  and  solve  it  with  the  performance  advantages  of  iterative  algorithms. 

6.2  Recommendations 

Nonstationary  iterative  algorithms  can  provide  significant  advantages  over  LMS  for  adaptive  noise 
cancellation.  Optical  processing  will  be  necessary  to  implement  these  algorithms  in  a  real  time 
environment  because  of  the  computational  load.  These  algorithms  are  good  candidates  for  optical 
implementation  because  they  take  advantage  of  the  power  of  optics,  rather  than  just  mimic  what  is 
already  being  done  electronically.  The  technology  is  here  now  to  realize  optically  the  simplest  of  these 
algorithms,  namely  nonstationary  steepest  descent  with  fixed  stepsize.  The  means  to  do  this  was 
outlined  in  the  previous  section.  A  successful  optical  implementation  of  this  algorithm  would  open  this 
whole  class  of  algorithms  to  optics.  The  numerical  examples  of  section  3  show  the  potential 
improvement  possible  through  the  use  of  the  conjugate  gradient  algorithm.  This  algorithm  could  be 
implemented  optically  if  a  means  can  be  found  to  accomplish  scalar  multiplication  and  division  in  the 
all-optic  domain.  The  hybrid  processor  discussed  in  the  previous  section  provides  a  means  of  validating 
this  entire  class  of  algorithms  for  signal  processing  applications.  If  improvements  can  be  made  in  A/D 
conversion,  such  a  processor  could  find  practical  use. 

Finally,  the  ultimate  goal  of  optical  computing  in  signal  processing  applications  should  be  to  produce 
an  optical  processor  using  integrated  optics,  or  perhaps  some  three  dimensional  analog  of  integrated 
optics  (three  dimensional  wave  guides  have  already  been  developed).  Acousto-optic  cells,  lenses,  lasers, 
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delay  lines,  and  detectors  have  all  been  fabricated  in  integrated  optics  devices,  with  the  technology  for 
spatial  light  modulators  lagging  somewhat  behind.  When  integrated  optics  technology  matures  we  can 
hope  to  bring  optical  computing  techniques  out  of  the  laboratory  and  into  the  Held  in  the  form  of 
rugged,  practical  devices. 
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ABSTRACT 

An  analysis  was  done  on  an  acousto-optic  signal  processor  in 
order  to  make  recommendations  about  possible  improvements  in  the 
algorithm  being  used.  The  optical  nature  of  the  processor 
necessitates  looking  at  data  in  an  analog,  rather  than  digital, 
fashion.  The  minimization  problem,  which  the  processor  is 
solving,  is  formulated  in  an  analog  way.  This  leads  to  an 
operator  equation  in  Hilbert  space,  rather  than  the  usual  matrix 
equation.  Operator  theoretic  versions  of  steepest  descent  and 
conjugate  gradient  algorithms  are  discussed.  Block  diagrams  are 
given  for  these  algorithms,  along  with  recommendations  for 
possible  optical  implementations. 


I.  Introduction 


My  Ph.D.  dissertation  topic  at  Purdue  University  concerned 
the  numerical  solution  of  integral  equations.  After  several 
years  of  working  in  that  area,  I  decided  to  branch  out  into  other 
areas,  and  began  looking  at  optical  signal  processing  and  image 
processing.  Through  the  Center  for  Applied  Cities  at  the 
University  of  Alabama  in  Huntsville  I  was  able  to  do  funded 
research  in  optics  in  1985. 

The  research  problem  I  selected  at  the  Rone  Air  Development 
Center,  Griffiss  AFB,  NY,  concerns  the  investigation  of  different 
mathematical  algorithms  for  implementation  on  an  acousto-optic 
signal  processor.  Because  of  the  analog  nature  of  the  optics 
involved,  the  problem  to  be  solved  turns  out  to  be  an  integral 
equation.  The  combination  of  optics  and  integral  equations  makes 
this  problem  particularly  well  suited  to  my  background. 


II.  Objectives  of  the  Research  Effort 

The  objective  of  the  research  effort  is  to  improve  the 
performance  of  an  acousto-optic  signal  processor  (already  in 
experimental  operation)  by  iitpleroenting  a  more  efficient 
mathematical  algorithm.  The  system  in  operation  now  uses  a 
Hcwells-Applebaum  least  mean  square  (IMS)  algorithm.  It  was  felt 
that  performance  could  be  improved  if  a  more  powerful  algorithm, 
such  as  the  conjugate  gradient  algorithm,  were  implemented.  My 
individual  objectives  were: 

1.  Familiarize  myself  with  the  acousto-optic  processor  new  in 
operation  in  order  to  fully  understand  its  implementation  of  the 
LMS  algorithm. 

2.  Formulate  the  mathematical  problem  to  be  solved,  keeping  in 
mind  the  special  nature  of  the  optical  processing  involved. 

3.  Study  the  conjugate  gradient  and  related  algorithms  and 
investigate  the  feasibility  of  implementing  these  algorithms  in 
an  optical  system  similar  to  the  one  now  in  operation. 

4.  Construct  a  block  diagram  for  the  conjugate  gradient 
algorithm  and  make  recommendations  about  possible  optical 


implementations 


III.  Analog  Ebrmulation  of  the  Minimization  Problem  for  the 
Optical  System 

We  consider  the  signal  processing  problem  of  cancelling 
noise  from  a  main  signal.  The  receiving  configuration  consists 
of  a  main  antenna  and  N  cmni-directional  side  antennas.  An 
aoousto-optic  processor  for  such  a  system  has  been  proposed  and 
implemented  by  Vannicola  and  Penn  [VP1,VP2] .  A  similar  system 
has  been  considered  by  Vander  Lugt  tvij . 

We  denote  by  n^tt)  ,n2(t) , .. .  ^(t)  the  signals  received  at 
the  side  antennas  at  time  t,  and  by  s(t)  +  n(t)  the  main  signal 
plus  noise  received  at  the  main  antenna.  Each  side  channel 
signal  n.  (t)  is  input  through  an  acousto-optic  device  which 
produces  a  continuum  of  delayed  signals  n.(t-x),  where  the  delay 
x  ranges  fran  0  to  a  value  R  which  depends  on  the  acousto-optic 
device  (R  is  typically  in  the  range  5-50  microseconds) .  We  call 
x  a  'spatial'  variable  here,  since  it  represents  position  across 
the  acousto-optic  device,  although  it  can  also  be  thought  of  as 
another  time  variable. 


Our  problem  is  to  form  a  weighted  combination  of  the  delayed 
secondary  signals  n^t-x)  in  such  a  way  that  the  result  is  a  good 
estimate  of  the  noise  n(t)  received  at  the  main  channel.  The 
continuous  nature  of  the  delays  necessitates  that  we  look  at  this 
problem  in  an  analog  way,  rather  than  the  usual  discrete 
formulation  involving  matrices  and  vectors.  Thus,  we  define  a 
Hilbert  space  H  consisting  of  the  set  of  oanplex  vector  valued 
functions  lh(x)  «  (h^(x)  ,...,h^(x))  defined  on  the  real  interval 
[0,R]  with  inner  product 


hi(x)gi(x)  dx 


where  for  a  conplex  variable  z,  z  denotes  its  complex  conjugate. 


Define  the  functions  of  t wo  variables  f.(x,t)  *  n.(t-x),  i  = 

^  *  t-h 

and  let  £(x,t)  be  the  vector  valued  function  whose  i 
component  is  fj(x,t).Our  problem,  then,  is  to  determine  a  vector 
valued  weight  function  w(x)  =  (w^(x) , . . . ,wN(x) )  so  that  the 

scalar  function 

y(t)  =  (f(.,t),w)  (3.1) 

is  a  good  approximation  of  the  noise  n(t). 


The  output  of  the  system  is  the  "error"  signal  e(t),  which 
is  the  main  signal  plus  noise  minus  the  estimated  noise: 


e(t)  =  s(t)  +  n(t)  -  y (t) . 


The  quantity  we  wish  to  minimize  is 


E(|e(t) 


2 

dt. 


(3.2) 


E  can  be  thought  of  as  an  expected  value  over  time,  although  for 
the  purposes  of  our  minimization  problem  we  have  emitted  any 
reference  to  a  probability  density  function.  (Che  can  also  think 
of  (3.2)  as  an  "energy"  integral.)  Setting  d(t)  *  s(t)  +  n(t),  we 
find 


|e(t)  |  *  e(t)e(t) 

*  (d(t)  -  y(t))(d(t)  -  y(t) ) 

2  2 
«  |d(t)|  -  y(t)d(t)  -  d(t)y(t)  +  |y(t)|  . 

Then 


E(y(t)d(t)) 


OO 

\  y(t)i!(t)  at 


f4(x,t)w.  (x)  dx  \  dt 


<j-l*  3 


■] 


Ejfl  fj  (x,t)<3(t)  dt^  wJfxT  dx 
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(b,w) 


where 


Similarly, 


/  _  '\ 

t)d(t)  dt 


b(x)  = 


\f  (x,l 
1 


(**  ’ 
)  f  (x,t 


t)d(t)  dt 


0  N 


V 


J 


E(y(t)d(t))  *  E(y (t)d (t) ) 


(b,w) 

(w»b). 


Also,  using  (3.1),  we  find 
2 

E<  |y(t>  |  )  •  \  y(t)y(t)  dt 
'8 


E  ^  £j  (x)  dx^  U  ( .  ,t)  ,w)  dt 

sc  •iw  ff  (f  (.,t),w)  dt^  dx 


«  (Aw,w) , 

where  A  is  an  operator  mapping  H  to  H,  defined  by 
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1 


Aw(x)  = 


_ _ 

}  fN(x,t)  (T(T7t)  ,w)  dt 


The  j  n  component  of  Aw  can  be  rewritten  as 


N  /R  _ 

V  f.:(x,t)2I;\  fk(s,t)wk(s)  ds  dt  = 

'0  J  k*l'0 

=12^  w.  (s)\  f.  (x,t)£.  (s  7 1) 
k=l)0  K  J0  3  * 


dt  ds 


-  (W»Aj (X, • ) ) , 


where  A  -  (x,s)  is  a  vector  valued  function  of  two  spatial 


variables  whose  k^  component  is  given  by 


f  , _ 

(x,s)  =  j  fR (s,t)£j (x ft)  dt 


for  k  *  1  , « • . jN . 


The  operator  A  can  be  thought  of  as  an  analog  "outer 
product".  It  corresponds  to  the  cross-correlation  (or 
covariance)  matrix  in  the  usual  discrete  formulation  of  the 
problem. 


It  is  straightforward  to  shew  that  for  functions  w,  u  in  H 


we  have 


(Aw,u)  *  (w,Au) 


so  that  A  is  a  self  adjoint  operator  on  H.  Also,  one  can  shew 


I 


(^,U) 


f 


|(u,f(.,t))|  dt. 


(3.3) 


The  expression  on  the  right  side  of  (3.3)  is  >  0  for  u  £  0,  thus 
A  is  a  positive  operator  on  H. 


Cur  minimization  problem  can  thus  be  reformulated  as  the 
problem  of  minimizing  the  functional  F  defined  by 


2 

F(w)  =  E(  |d  (t)  |  )  -  (b,w)  -  (w,b)  +  (Aw,w)  (3.4) 


The  right  side  of  (3.4)  is  just  E( |e(t)  |2)  (see  (3.2)).  The 
quantity  E(|d(t)  |  )  is  independent  of  w  and  is  of  no  consequence 
in  the  minimization  problem.  Since  A  is  a  positive  operator,  the 
problem  of  minimizing  (3.4)  is  equivalent  to  solving  the  operator 
equation 

Aw  *  b  (3.5) 

([M,  Theorem  2.1]).  Equation  (3.5)  can  also  be  written  as  a 
coupled  system  of  integral  equations: 

S  C  w  (s)A..  (x,s)  ds  «  b.(x),  j  *  1,...,N  (3.6) 

k*l  >2  K  3K  3 

IV.  The  Bcisting  Architecture 

Before  examining  any  new  algorithms  for  the  solution  of 
(3.5),  let  us  first  look  at  what  the  existing  optical  system 
(reported  in  lVPl,VP2))  is  doing.  Figure  1  is  a  simplified 
diagram  of  this  system,  showing  one  side  channel  only. 

This  architecture  is  implementing  the  US  algorithm,  which 
is  an  approximate  version  of  the  method  of  steepest  descent. 
Assume,  for  the  moment,  that  we  are  receiving  signal  samples  at 
discrete  time  intervals  t  *  iAt  for  same  fixed^t.  Consider  the 
following  iterative  scheme  for  determining  the  weight  function 
w(x)  (one  side  channel  only): 

wi+i(x)  =  w.(x)  +  a.p.(x). 
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(4.1) 


nj(t) 


-  y(t) 


Figure  1 

BLOCK  DIAGRAM  FOR  CURRENT  SYSTEM 


The  functions  p. (x)  are  direction  ’vectors’,  a.  are  scalars,  and 

til  ^ 

w^(x)  is  the  i  iterative  approximation  of  w(x).  The  method  of 

steepest  descent  uses,  for  p^(x),the  negative  gradient  of  the 

functional  F,  defined  by  (3.4),  evaluated  at  w^(x).  This 

gradient  is  -2r^(x)  (cf.,  [L])  where 

ri(x)  =  b{x)  -  Awi(x)  (4.2) 

is  the  1th  residual. 

2 

The  LMS  method  uses  the  gradient  of  |e.j  ,  instead  of  the 

2  * 

gradient  of  F(w)  «  E{ le^J  > ,  where  e.^  *  e(iAt).  This  gradient 
can  be  computed  as  -2e^n^(i&t-x) .  The  algorithm  thus  becomes 

wi+i(x)  «  w.(x)  +  aie^n^(i^t-x)  (4.3) 

where  we  have  absorbed  the  factor  2  as  part  of  a^.  Notice  that 
now  we  do  not  have  to  compute  the  operator  A,  as  is  necessary  in 
determining  the  residual  defined  by  (4.2). 


0 


The  iterative  scheme  given  by  (4.3)  can  be  easily  solved  by 
observing  that 

K-l 

JZ  wi+l(x)  “  wi(x)  a  ■*  w0(x)‘ 

Taking  w0(x)  *  0,  we  obtain  frcm  (4.3) 

K-l 

wR(x)  =  JZ  a.e^UAt-x) .  (4.4) 

We  now  make  the  assumption  that  the  step  size  a^  (or  "beam 
steering  signal"  (MM])  has  been  incorporated  into  the  signal  n^. 
Letting  &  t  0,  we  get  the  analog  version  of  (4.4) ,  namely 

w(x)  *  (  e(s)n.(s-x)  ds.  (4.5) 

This  is  the  first  integral  which  appears  in  Figure  1.  It  can 
also  be  interpreted  as  a  correlation  of  n^  with  the  “error" 
signal  e(t) . 

The  advantage  of  this  implementation  is  its  simplicity. 
There  is  no  iteration  loop,  rather,  the  iteration  scheme  has  been 
solved  directly,  and  an  expression  for  the  solution  implemented. 
Also,  the  problem  of  oonputing  the  operator  A  has  been  avoided 
completely. 

The  disadvantage  is  that  it  may  not  produce  an  accurate 
solution  for  w(x).  The  method  of  steepest  descent  typically  can 
have  very  slow  convergence,  and  one  would  expect  this  IMS  method 
to  be  even  slower. 


V.  The  Oon jugate  Gradient  Algorithm 


We  now  consider  another  iterative  method  for  solving 
equation  (3.5).  We  return  to  the  iteration  equation  (4.1),  which 
we  now  write  for  the  case  of  multiple  side  channels,  so  that  w^ 


(x)and  £^(x)  are  elements  of  the  Hilbert  space  H: 
wi+l<x)  *  w.(x)  +  a.  £i(x). 

We  now  choose  the  direction  vectors  £^(x)  to  be  a  set  of  linearly 
independent,  A-orthogonal  vectors,  ie.,  the£^(x)  are  such  that 

(£i»A£j)  *  0  for  i  #  j.  (5.1) 

The  scalar  a^  is  chosen  at  each  step  of  the  iteration  process  to 
minimize  the  value  of  F(w^).  The  iteration  method  is  then  said 
to  be  a  conjugate  directions  method  (the  use  of  the  word 
'conjugate'  here  comes  from  the  fact  that  vectors  satisfying 
(5.1)  are  said  to  be  A-conjugate) . 

If  one  chooses  as  the  vectors  £^(x)  the  A-orthogonalized 
residuals  r^(x)  =  b(x)  -  ta^x)  then  one  obtains  the  conjugate 
gradient  method.  This  method  can  be  summarized  by  the  following 
iteration  scheme  (cf.,[LJ): 

2WX)  =-Si(x)  +aiEi<x> 

Ehi(x)  *  ii+l(x>  '  °i  Ei(x> 

ai  *  <5-2> 

°1  "  <Ii^Ei'/<Ei-sEi> 

r^(x)  =  b(x)  -  Aw^ (x) . 

The  value  of  a^  comes  from  minimizing  F(w^),  and  the  value  of  c^ 
canes  from  A-orthogonalizing  the  vectors  r^(x).(It  is  a 
nontrivial  property  of  this  method  that  one  need  only 
A-orthogonalize  £^+1(x)  with  respect  tO£^(x),  and  not  all  the 
preceding  £j(x)’s,  to  obtain  a  canplete  A-orthogonal  set  (cf. 
[L]).) 

What  is  the  motivation  for  considering  conjugate  direction 
met-  -ds?  Che  reason  is  the  following  fact.  Suppose  we  have  some 
weight  value  w^(x)  and  we  compute  a  new  value  w^x)  fran 


«l(x)  *  Wgfx)  +  a0  Egtx) 

where  £q(x)  is  any  nonzero  direction  vector  and  ag  is  chosen  to 
minimize  F(w^)  (so  ag  is  given  by  the  expression  in  (5.2)  with 
i*0).  Now  suppose  w*(x)  is  the  true  solution  of  (3.5).  Then  the 
oorrect  direction  to  go  in,  in  order  to  reach  exactly  w*  on  the 
next  step,  is  always  going  to  be  A-orthogonal  to  the  previous 
direction  vector  used  (in  this  case,  ^(x)).  To  see  this,  note 
that  the  direction  fran  to  w*  is  w*  -  w^  and 

*  (£g»b  -  Awx) 

-  (Ej-b  -  +  »,£,)) 

■  <£«'£  -*£«>-  »«<£*'*£*> 

(a0  is  real) 

-  <£,'£«>  -  a0(E0'AE0> 

»  0  from  (5.2). 

In  the  discrete  case,  when  A  is  a  finite  dimensional  matrix, 
there  are  only  finitely  many  directions  which  are  A-orthogonal  to 
a  given  vector.  Conjugate  direction  methods  search  through  this 
finite  list  until  exactly  the  right  direction  vector  is  found. 
They  are  thus  guaranteed  to  converge  to  the  exact  solution 
(ignoring  roundoff  errors)  in  a  finite  number  of  steps.  In 
contrast,  if  the  method  of  steepest  descent  does  not  obtain  the 
exact  solution  in  one  step,  then  it  will  always  take  infinitely 
many  steps  to  reach  the  exact  solution  ((LSJ). 

we  are  not  dealing  with  a  finite  dimensional  matrix,  but 
rather  with  an  "infinite  dimensional"  operator  A,  so  the  finite 
step  advantage  mentioned  above  is  not,  in  general,  applicable  to 
our  situation  (there  are,  however,  cases  when  finite  convergence 
is  attained  for  an  operator  A  (cf.  [LS]).  However,  the 
conjugate  gradient  method  will  always  converge  more  rapidly  than 
the  method  of  steepest  descent  (see  [D]  for  estimates  on  the  rate 
of  convergence) . 
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VI.  Block  Diagrams 

In  this  section,  block  diagrams  for  two  iterative  methods 
are  presented.  These  diagrams  are  constructed  with  optical 
implementation  in  mind  (eg.,  there  is  no  storage  of  data  or 
previously  computed  results).  Boxes  labeled  "compute  A", 
"compute  b”,  etc.,  cure  indicated  in  detail  in  separate  diagrams. 

Figure  2  shows  a  diagram  for  the  method  of  steepest  descent. 
This  method  is  included  here  because  it  is  simpler  to  implement 
than  the  conjugate  gradient  algorithm,  yet  it  contains  most  of 
the  computational  difficulties  (computing  A,  b,  inner  products, 
and  inverting  scalars) .  If  this  method  can  be  implemented 
optically,  then  it  would  be  relatively  straightforward  to  modify 
the  resulting  system  for  the  conjugate  gradient  algorithm. 

Figure  6  shows  the  diagram  for  the  conjugate  gradient 
algorithm.  As  one  can  see,  it  contains  all  of  the  computations 
required  by  steepest  descent,  plus  additional  computations 
required  for  obtaining  the  vectors  (x) .  Since  we  have  not 
assumed  the  possibility  of  storing  previously  computed  results, 
we  must  compute  both  r^  and  r^+^  in  each  iteration  loop.  Each  of 
these  residual  conputations  requires  the  computation  of  A  and  b 
(see  Figures  3-4) . 
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rij  (t) — ^ — I  Co,"Jute  f 

u"r'J 

A£j(x)- 


£,(*> 


Inner 

Product 


Xf+i(x)1p^ctr(Iiti,A£,) 


(£1-»A£i) — > -  Invert 


(£,  »A£i)_1 


ct  s  (li+1.A£i)/(£i,A£.) 


Figure  7.  COMPUTING  ci 


A-17 


VII.  Recommendations 


In  order  to  achieve  an  optical  implementation  of  either  of 
the  iterative  methods  discussed  in  the  previous  section,  we  must 
first  be  able  to  implement  the  individual  operations  shown  in  the 
block  diagrams.  The  major  operations  are: 

1.  Ccnputation  of  the  operator  A  (Figure  3) .  This  requires  two 
acousto-optic  cells  and  space  and  time  integration.  In  the 
current  system,  space  integration  is  done  with  a  lens,  and  time 
integration  with  a  liquid  crystal  light  valve  (LCLV) .  In  fact, 
all  the  operations  needed  to  compute  A  are  done  in  the  current 
system,  so  this  should  present  no  problem.  Since  A  essentially 
represents  an  "outer  product",  reference  [A],  which  discusses 
optical  computation  of  outer  products,  may  be  helpful. 

2.  Computation  of  b  (Figure  4).  This  requires  another 
acousto-cptic  cell,  and  a  time  integration  (LCLV) . 

3.  Inner  Products.  Each  inner  product  requires  complex 
conjugation,  pointwise  multiplication,  space  integration,  and 
sunmation.  Also,  both  vectors  will  be  represented  as  light,  so 
an  acousto-cptic  cell  (which  has  one  electronic  input)  may  not  be 
appropriate.  An  efficient  optical  means  will  have  to  be  found  to 
cxmpute  these  inner  products. 

4.  Inverting  Scalars.  This  may  be  the  hardest  operation  to 
implement  optically.  It  may  require  its  own  iterative  loop. 

The  iterative  loops  involved  in  both  the  steepest  descent 
and  conjugate  gradient  algorithms  are  a  major  departure  fran  the 
existing  optical  system.  There  are  two  alternatives  to 
approaching  the  implementation  of  these  loops,  both  having  to  do 
with  the  idea  that  the  weights  are  supposed  to  be  slowly  varying 
with  time. 

The  first  approach  would  be  to  consider  taking  in  data  in 
blocks,  rather  than  continuously,  and  doing  a  set  number  of 
iterations  on  each  block  of  data  to  determine  the  weights.  The 
value  of  the  weights  would  be  updated  as  each  block  of  data  comes 
in.  This  would  be  a  true  implementation  of  the  algorithms  as 
outlined  above. 
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A  second  approach,  which  requires  further  analysis  but  would 
be  easier  to  implement,  would  be  to  do  just  one  iteration  as  part 
of  the  existing  loop.  Since  the  weights  are  assumed  "constant" 
in  time,  this  would  have  the  effect  of  many  iterations  as  new 
data  is  continuously  brought  in  and  sent  through  the  system. 
Also,  it  would  be  "adaptive",  just  as  the  present  system  is,  in 
that  changes  in  the  data  should  eventually  be  reflected  in 
changes  in  the  weights.  This  implementation  would  not  too 
different  from  the  existing  system,  but  further  analysis  is 
needed  to  determine  if  the  algorithms  are  still  valid  when  the 
iterations  are  slowly  varying  in  time. 

My  recommendation  is  that  further  analysis  of  the  second 
approach  mentioned  above  be  carried  out.  It  should  be  compared 
to  the  first  approach,  ie.,  the  standard  implementation  of  the 
algorithms.  A  computer  simulation  study  comparing  both  should  be 
done.  If  an  optical  implementation  seems  feasible,  it  should  be 
carried  out  for  the  steepest  descent  method  first,  since  most  of 
the  computational  difficulties  are  encountered  there.  The 
conjugate  gradient  algorithm  can  be  inplemeted  as  a 
straightforward  modification  of  steepest  descent. 
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